LINKED DATA IS THE FUTURE IN ACADEMIC LIBRARIES 


Linked Data is the Future of Description and Access in Academic Libraries 


Oksana V. Moshynska 


Marshall School of Business, University of Southern California 




Abstract 

The world’s steady transitioning online has been expedited by the ongoing COVID-19 pandemic. 
Libraries need to swiftly move to the online environment, and the semantic web and linked data 
seem to be a perfect solution. Linked data will help to connect structured data and their 
relationships on the Internet. By linking data, academic libraries will be able to achieve their 
open access mission. Once all library materials, including open access publications, are linked on
the semantic web, these resources, representing all human knowledge, will become interconnected.
This will enable the description, access, and sharing of human knowledge without
any barriers, making such knowledge machine-readable and ready for processing by artificial 
intelligence. 

Keywords: linked data, semantic web, metadata, open access, public access, institutional
repositories, digital archive, information storage and retrieval, academic libraries, university
faculty




Linked Data is the Future of Description and Access in Academic Libraries 

With the entire world transitioning online, the future of all libraries cannot be imagined 
without linked data; this is especially true for academic libraries. Linked data is a set of
practices for publishing, sharing, and connecting structured data and their relationships on the
web (Baker et al., 2011; Park & Kipp, 2019). In recent years, linking data
has become one of the top priorities for libraries (Zhu, 2019). In addition to text materials, there 
are image, video, and audio resources, as well as large datasets and raw data (e.g., genome, 
geospatial) that need to be linked to authors/creators, their affiliations, and published articles 
(Fernandez & Tilton, 2018). 

Transitioning from the world wide web to the semantic web is not possible without 
linking data. In fact, linked data is “authority control for the semantic web” (Zhu, 2019, p. 216) 
and “the heart of what semantic web is all about” (W3C, n.d.). The semantic web may be defined
as an environment in which all existing data are machine-readable and can be retrieved with
simple queries (W3C, n.d.). Linking data has become so important that new terms have been
introduced, such as “linked open data” and “library linked data” (Baker et al., 2011, para. 1).

One of the greatest benefits of linking data is that it will help librarians to meet one of 
their ultimate goals, described in the United Nations Universal Declaration of Human Rights, 
specifically, “the right to know and to be informed” (Ford, 2018, p. 267). It will also help to 
reduce and ultimately eliminate inequality in information and digital literacy. Linking data,
however, is not an easy task that can be achieved overnight. Librarians need to examine all
available metadata tools and techniques used to make published research visible, discoverable,
and usable (Fernandez & Tilton, 2018).




Linked Data and Metadata Tools 

Many libraries currently use the MARC schema; however, MARC cannot accommodate
linked data and is not suitable for linking needs (Zhu, 2019). Therefore, another schema is
needed, one that supports the description formats required for linking data.
In fact, the transition to linked data has already begun; the MARC schema is now considered
a legacy system as libraries transition to linked data.

In the MARC schema, data are tied to the context of the record, identifiers are text
strings, and relationships are presented as notes. In linked data, things are described
independently of the record context, and both identifiers and relationships are recorded as
uniform resource identifiers (Zhu, 2019, p. 218). Several description formats and technologies
may serve as MARC’s replacement, with the most suitable candidates being URIs (uniform resource
identifiers), RDF (Resource Description Framework), SPARQL (SPARQL Protocol and RDF Query
Language), BIBFRAME, OWL (Web Ontology Language), and linked data in WorldCat (Fernandez &
Tilton, 2018; W3C, n.d.).
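
The contrast between record-bound text strings and URI-based statements can be illustrated with a minimal Python sketch. It is not taken from any of the cited sources: the sample record, the example.org URIs, and the helper names are hypothetical, and only the Dublin Core terms namespace is a real vocabulary.

```python
# Illustrative only: the example.org URIs and the sample record are hypothetical.
EX = "http://example.org/"
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

# Record-centric description: identifiers are text strings and the
# relationship to another work is buried in a free-text note.
record = {
    "title": "Example Book",
    "author": "Doe, Jane",
    "note": "Translation of: Beispielbuch",
}

# Linked-data description: each statement is a (subject, predicate, object)
# triple, and things and relationships are named with URIs.
triples = [
    (EX + "work/1", DCT + "title", "Example Book"),
    (EX + "work/1", DCT + "creator", EX + "person/jane-doe"),
    (EX + "work/1", DCT + "isVersionOf", EX + "work/2"),
    (EX + "work/2", DCT + "title", "Beispielbuch"),
]


def is_uri(value):
    """Treat any string starting with http:// or https:// as a URI."""
    return value.startswith(("http://", "https://"))


def linked_resources(subject, graph):
    """Return the URIs that a subject points to, in graph order."""
    return [o for s, _, o in graph if s == subject and is_uri(o)]


print(linked_resources(EX + "work/1", triples))
```

In the record, a program cannot follow the note to the related work; in the triples, the related work and the author are themselves addressable resources that other datasets can link to.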

BIBFRAME, which is supported by the Library of Congress and Zepheira, is helpful during the
transition from MARC records to linked data. However, BIBFRAME relies heavily on
Wikipedia, which is biased (Wikipedia, n.d.-a). Wikipedia itself includes a disclaimer that its
information “should not be considered a definitive source in and of itself” (Wikipedia, n.d.-b,
para. 1). Wikipedia is a crowdsourcing-based platform, and its records are created by ordinary
people who have different education, interests, beliefs, and biases. Any errors that are
introduced may take “days, weeks, months, or even years” to correct (Wikipedia, n.d.-b, para. 1).
In addition, many references have dead or broken links, making these sources unreliable.

The most promising resource type for the semantic web, and thus for linking needs, is the
Resource Description Framework (RDF), as it is schema-neutral. RDF is based on finding
common attributes in other resource types in order to connect to them (Joudrey & Taylor,
2018, p. 293). It is the only schema that could potentially connect all other existing
schemas, and this could be done without re-coding existing records. The process
includes creating metadata in RDF, storing it, designing URIs, and interlinking them with other
URIs as well as with common fields in other datasets (Zhu, 2019, p. 220). The interlinking makes
more records visible and discoverable. Lincoln (2015) considers SPARQL useful for
translating complicated graphs into simple tables that can be read by spreadsheet software such
as Excel, and suggests that its combination with RDF makes it a powerful tool for linking data.
In addition, RDA entities are transferable to BIBFRAME, except for the core entity “expression,”
which does not have an equivalent in BIBFRAME (Zapounidou et al., 2019, p. 280). According to the
managing director (Hennelly, n.d.), the RDA Toolkit will make linking much easier, as links can
be built not only from MARC and other schema records; they may also stand on their own and
point to the RDA rules using RDA URIs. This promising tool may allow fast and smooth
linking of all resource data.
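
The graph-to-table translation that Lincoln (2015) attributes to SPARQL can be approximated in plain Python. The sketch below is not SPARQL and does not use an RDF library; it is a simplified stand-in that pivots a handful of triples into rows and writes them as CSV text that a spreadsheet could open. The article URIs, titles, and function names are hypothetical.

```python
import csv
import io

EX = "http://example.org/"         # hypothetical namespace
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

# A small graph of (subject, predicate, object) statements.
GRAPH = [
    (EX + "article/1", DCT + "title", "Open access in practice"),
    (EX + "article/1", DCT + "creator", EX + "person/a"),
    (EX + "article/2", DCT + "title", "Linked data basics"),
    (EX + "article/2", DCT + "creator", EX + "person/b"),
]


def graph_to_rows(graph, columns):
    """Pivot triples into one row per subject, one column per chosen predicate."""
    rows = {}
    for subject, predicate, obj in graph:
        if predicate in columns:
            row = rows.setdefault(subject, {"resource": subject})
            row[columns[predicate]] = obj
    return [rows[s] for s in sorted(rows)]


def rows_to_csv(rows, fieldnames):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


COLUMNS = {DCT + "title": "title", DCT + "creator": "creator"}
TABLE = graph_to_rows(GRAPH, COLUMNS)
CSV_TEXT = rows_to_csv(TABLE, ["resource", "title", "creator"])
print(CSV_TEXT)
```

A real SPARQL SELECT query performs the same kind of projection against a triple store, with variables taking the place of the column mapping used here.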

Several factors require special attention when linking data, including human factors
(Schilling, 2012, p. 8), privacy, security (Kirrane et al., 2018, p. 153), infrastructure,
policy-related issues, and process integrity (Harron et al., 2017, p. 7). Additional resources
are required for these as well as for data quality assurance (Schilling, 2012, p. 8), which
includes correcting typos, fixing links to outdated sources, correcting relationships, and
fixing errors created during the machine-reading entry process (Harron et al., 2017, p. 7). To
summarize, to link data properly, the four requirements described by Berners-Lee (2006) need to
be met: 1) URIs for things, 2) HTTP for lookup, 3) RDF or SPARQL for description, and 4) links
for connecting to other URIs. Despite the need for additional human and infrastructure
resources, the end product, a global network of all human knowledge, is invaluable for libraries
and humankind.
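
The four requirements can be read as a checklist to run against any resource description. The following Python sketch is one possible reading, under stated assumptions: it inspects only the form of the identifiers and does not perform an actual HTTP lookup, and the graph, the example.org and urn:example identifiers, and the function names are all hypothetical.

```python
from urllib.parse import urlparse

EX = "http://example.org/"         # hypothetical namespace
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

GRAPH = [
    (EX + "article/1", DCT + "title", "Open access in practice"),
    (EX + "article/1", DCT + "creator", EX + "person/a"),
    ("urn:example:article-2", DCT + "title", "Linked data basics"),
]


def is_http_uri(value):
    """True if the value is an http(s) URI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_requirements(subject, graph):
    """Check a subject against the four requirements, by form only."""
    statements = [(p, o) for s, p, o in graph if s == subject]
    return {
        # 1) the thing is named with a URI of some scheme
        "uri_for_thing": bool(urlparse(subject).scheme),
        # 2) the URI uses HTTP, so it could be looked up
        "http_lookup": is_http_uri(subject),
        # 3) at least one statement describes the thing
        "description": bool(statements),
        # 4) the description links to at least one other HTTP URI
        "links_to_other_uris": any(
            is_http_uri(o) and o != subject for _, o in statements
        ),
    }


print(check_requirements(EX + "article/1", GRAPH))
print(check_requirements("urn:example:article-2", GRAPH))
```

The first resource satisfies all four checks, while the second has a URI and a description but cannot be looked up over HTTP and links to nothing else, so it would remain isolated on the web.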

Linked Data in Academic Libraries and Open Access

While linking all library resources to make them discoverable on the semantic web is an
ultimate goal (Zhu, 2019, p. 215), academic libraries tasked with implementing open
access cannot do so successfully without the ability to link data.

Academic librarians work hard to make all published research results open to the public.
Open access is defined as the public’s immediate and unlimited access to all published research
results. The open access movement (Budapest Open Access Initiative, n.d.) has been around for
almost twenty years; however, it has yet to reach its full potential because both
librarians and academic researchers face various barriers. The ultimate mission of open access is
not only to achieve access to all published research materials but also to achieve “open science”
(Piazzini, 2020) and “open education” (Smith & Dickson, 2017). These would be achieved by
allowing everyone to have unrestricted rights to access, read, save, print, and reuse research
results immediately upon publication (Crawford, 2011; Smith & Dickson, 2017). Estimates of the
share of closed (non-open access) articles vary from a quarter to three quarters of published
articles, depending on the discipline (Fruin, 2019; Mikki, 2017; O’Hanlon et al., 2020).

Gold and green open access routes are closest to the ultimate open access goal because,
though costly for authors, they are free for users to access (Finlay, 2019; Kelly, 2013; Sotudeh &
Estakhr, 2018). In an attempt to meet the requirements of the open access mission, librarians
created the green route by developing institutional repositories where authors could upload
their articles (Bedord, 2018, p. 63). The Canadian Association of Research Libraries (n.d.-a)
described an institutional repository as a “digital archive of an institution’s intellectual
output.” Ways to locate articles “populated with content metadata” (Finlay, 2019, p. 6) in
institutional repositories included DOIs (digital object identifiers) (DOI, n.d.) and OpenURLs
(open uniform resource locators), with the latter accepted as a National Information Standards
Organization (NISO) standard (Finlay, 2019, p. 7). However, Finlay (2019) found that researchers
preferred to upload their articles elsewhere rather than to institutional repositories, because
they believed that articles stored in institutional repositories remained undiscoverable.
Instead, researchers uploaded their articles to ResearchGate, Sci-Hub, and academic social
networks (Finlay, 2019). By doing this, they were able to reach a larger audience. The downside
of this so-called black open access route was that this sharing practice was illegal.

In June 2020, the Canadian Association of Research Libraries (n.d.-b), realizing the
limitations of current institutional repositories, started a new initiative that would allow
connecting all existing repositories into one central repository. This great initiative, however,
will only work if all data are linked. As an example, I used the article written by Zhang and
Watson (2017), academic librarians at the University of Saskatchewan. Their study of physical
sciences researchers funded by the Canadian Institutes of Health Research found that 87%-91% of
publications were closed. This open access (and thus open to the public) article was
undiscoverable on the web for some time. It would not come up in my searches unless I looked
for it on the University of Saskatchewan’s website, in the research archive called HARVEST,
which is built on the DSpace platform (University of Saskatchewan, n.d.-a). However, once the URI
(http://hdl.handle.net/10388/8089) was generated for this record (and relationships were created),
the article became discoverable on the web: a simple Google search brought it up. The
article’s characteristics also linked it to other valuable resource materials.




While open data are more about the legality of openly sharing data, linked data are about
the technical side of the process (Baker et al., 2011). Once an open access article is created
following the linked data lifecycle, it becomes discoverable. This leads to increased visibility of
published articles, expanded collaboration among researchers, improved research productivity,
and a growing number of citations. Once all library materials, including open access publications,
are linked on the semantic web, these resources, representing the entirety of human knowledge,
will be machine-readable and available for processing by artificial intelligence (Zhu, 2019,
p. 215). Once data are open and interconnected, academic libraries will be able to achieve their
open access mission to enable access and sharing of knowledge without any barriers.




References 

Baker, T., Bermes, E., Coyle, K., Dunsire, G., Isaac, A., Murray, P., Panzer, M., Schneider, J., 
Singer, R., Summers, E., Waites, W., Young, J., & Zeng, M. (2011). Library linked data 
incubator group final report. Retrieved October 22, 2020,
from https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/

Bedord, J. (2018). Where can you work with an MLIS? Extending your career reach. In K.
Haycock & M. Romaniuk (Eds.), The portable MLIS: Insights from the experts (2nd ed., pp.
69-82). Libraries Unlimited. Amazon.com

Berners-Lee, T. (2006). Linked data. W3C. Retrieved October 26, 2020, from
https://www.w3.org/DesignIssues/LinkedData.html


Budapest Open Access Initiative. (n.d.). Retrieved October 3, 2020, from 
https://www.budapestopenaccessinitiative.org/read 

Canadian Association of Research Libraries. (n.d.-a). Repositories in Canada.
https://www.carl-abrc.ca/advancing-research/institutional-repositories/repos-in-canada/

Canadian Association of Research Libraries. (n.d.-b). Canadian repository community call:
Repository work during the pandemic: New roles for staff in unusual times.
https://www.carl-abrc.ca/mini-site-page/july2020-repositories-community-call/

Crawford, W. (2011). Open access: What you need to know now. American Library Association. 
ProQuest Ebook Central. https://ebookcentral.proquest.com 

DOI. (n.d.). The DOI system. https://www.doi.org/


Fernandez, P., & Tilton, K. (2018). Applying library values to emerging technology: Decision-
making in the age of open access, maker spaces, and the ever-changing library.
Association of College and Research Libraries.
https://ebookcentral.proquest.com/lib/socal/reader.action?docID=5287734

Finlay, E. (2019). Library provision of intellectual access to open access journal articles. The
Serials Librarian, 77(1-2), 6-14. https://doi.org/10.1080/0361526X.2019.1628161

Ford, B. J. (2018). LIS professionals in a global society. In K. Haycock & M. Romaniuk (Eds.),
The portable MLIS: Insights from the experts (2nd ed., pp. 267-276). Libraries Unlimited.
Amazon.com

Fruin, C. (2019, January 30). Open access: Myths, facts, actions. Summary of Proceedings. 
72nd Annual Conference of the American Theological Library Association, Indianapolis, 
IN, United States, June 13-16, 2018. 289-294. 
https://doi.org/10.31046/proceedings.2018.47 

Harron, K., Dibben, C., Boyd, J., Hjern, A., Azimaee, M., Barreto, M., & Goldstein, H. (2017).
Challenges in administrative data linkage for research. Big Data & Society, 1-12.
https://doi.org/10.1177/2053951717745678


Hennelly, J. (n.d.). RDA toolkit: Do-it-yourself: Direct links to RDA instructions. 
https://www.rdatoolkit.org/examples 

Joudrey, D. N., & Taylor, A. G. (2018). The organization of information. (4th ed., pp. 291-368). 
Libraries Unlimited. Amazon.com 

Kelly, J. (2013). Green, gold, and diamond: A short primer on open access.
https://www.jasonmkelly.com/jason-m-kelly/2013/01/27/green-gold-and-diamond-a-short-primer-on-open-access


Kirrane, S., Villata, S., & d’Aquin, M. (2018). Privacy, security and policies: A review of
problems and solutions with semantic web technologies. Semantic Web, 9, 153-161.
https://doi.org/10.3233/SW-180289

Lincoln, M. (2015). Using SPARQL to access Linked Open Data. The Programming Historian, 
4. https://doi.org/10.46430/phen0047 

Mikki, S. (2017). Scholarly publications beyond pay-walls: Increased citation advantage for open
publishing. Scientometrics, 113, 1529-1538.
https://doi.org/10.1007/s11192-017-2554-0

O’Hanlon, R., McSweeney, J., & Stabler, S. (2020). Publishing habits and perceptions of open 
access publishing and public access amongst clinical and research fellows. Journal of the 
Medical Library Association, 108(1), 47-58. https://doi.org/10.5195/jmla.2020.751 


Park, H., & Kipp, M. (2019). Library linked data models: Library data in the Semantic Web.
Cataloging & Classification Quarterly, 57(5), 261-277.
https://doi.org/10.1080/01639374.2019.1641171

Piazzini, T. (2020). Open access as a new paradigm. An inevitable evolution? JLIS.it : Italian 
Journal of Library and Information Science, 11(3), 99-109. 
https://doi.org/10.4403/jlis.it-12631 

Schilling, V. (2012, September 25). Transforming library metadata into linked library data.
American Library Association.
http://www.ala.org/alcts/resources/org/cat/research/linked-data

Smith, K. L., & Dickson, K. A. (2017). Open access and the future of scholarly communication:
Implementation. Rowman & Littlefield.
https://ebookcentral.proquest.com/lib/socal/reader.action?docID=4730830


Sotudeh, H., & Estakhr, Z. (2018). Sustainability of open access citation advantage: The case of
Elsevier’s author-pays hybrid open access journals. Scientometrics, 115(1), 563-576.
https://doi.org/10.1007/S11192-018-2663-4

University of Saskatchewan. (n.d.-a). HARVEST: University of Saskatchewan’s research
archive. https://harvest.usask.ca/

University of Saskatchewan. (n.d.-b). Open Access: OA repositories.
https://libguides.usask.ca/open_access/repositories


W3C. (n.d.). Linked data. Retrieved October 26, 2020, from
https://www.w3.org/standards/semanticweb/data

Wikipedia. (n.d.-a). Ideological bias on Wikipedia. Retrieved October 18, 2020, from
https://en.wikipedia.org/wiki/Ideological_bias_on_Wikipedia

Wikipedia. (n.d.-b). Wikipedia: Wikipedia is not a reliable source. Retrieved September 27,
2020, from https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_is_not_a_reliable_source

Zapounidou, S., Sfakakis, M., & Papatheodorou, C. (2019). Mapping derivative relationships 
from RDA to BIBFRAME 2. Cataloging & Classification Quarterly, 57(5), 278-308.
https://doi.org/10.1080/01639374.2019.1650152 

Zhang, L., & Watson, E.M. (2017). Measuring the impact of gold and green open access. The 
Journal of Academic Librarianship, 43(4), 337-345. 
https://doi.org/10.1016/j.acalib.2017.06.004

Zhu, L. (2019). The future of authority control: Issues and trends in the linked data
environment. Journal of Library Metadata, 19(3-4), 215-238.
https://doi.org/10.1080/19386389.2019.1688368




